Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics
Authors
Abstract
In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on the longest common subsequence between a candidate translation and a set of reference translations. The longest common subsequence naturally takes sentence-level structural similarity into account and automatically identifies the longest co-occurring in-sequence n-grams. The second method relaxes strict n-gram matching to skip-bigram matching. A skip-bigram is any pair of words in their sentence order. Skip-bigram co-occurrence statistics measure the overlap of skip-bigrams between a candidate translation and a set of reference translations. The empirical results show that both methods correlate very well with human judgments of both adequacy and fluency.
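The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, only a single reference translation is compared (the abstract speaks of a set of references), tokens are split on whitespace, and combining precision and recall into an F-measure with a `beta` weight is an assumption made here for illustration, since the abstract does not state the scoring formula.

```python
from collections import Counter
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)          # DP row for the previous token of a
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def skip_bigrams(tokens):
    """Multiset of all word pairs in sentence order, with any gap between them."""
    return Counter(combinations(tokens, 2))


def f_measure(matches, cand_total, ref_total, beta=1.0):
    """Assumed scoring: weighted harmonic mean of precision and recall."""
    if matches == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


def lcs_score(candidate, reference, beta=1.0):
    """LCS-based similarity between one candidate and one reference sentence."""
    c, r = candidate.split(), reference.split()
    return f_measure(lcs_length(c, r), len(c), len(r), beta)


def skip_bigram_score(candidate, reference, beta=1.0):
    """Skip-bigram overlap between one candidate and one reference sentence."""
    c = skip_bigrams(candidate.split())
    r = skip_bigrams(reference.split())
    matches = sum((c & r).values())    # multiset intersection counts shared pairs
    return f_measure(matches, sum(c.values()), sum(r.values()), beta)


if __name__ == "__main__":
    reference = "police killed the gunman"
    for candidate in ("police kill the gunman", "the gunman kill police"):
        print(candidate,
              round(lcs_score(candidate, reference), 3),
              round(skip_bigram_score(candidate, reference), 3))
```

Both candidates share the same words with the reference apart from one token, yet the first keeps the reference word order and scores higher on both measures. That sensitivity to word order is the property the abstract attributes to in-sequence matching.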
Related resources
Plagiarism Detection using ROUGE and WordNet
With the arrival of the digital era and the Internet, the lack of information control gives people an incentive to freely use any content available to them. Plagiarism occurs when users fail to credit the original owner of the content they refer to, and such behavior violates intellectual property. Two main approaches to plagiarism detection are fingerprinting and term occurrence; ho...
A SVM Regression Based Skip-Ngram Approach to MT Evaluation
This paper describes an automatic MT evaluation metric named SNR, developed by the Machine Intelligence and Translation Lab of Harbin Institute of Technology for the NIST MetricsMATR 2008 evaluation. The metric extends the idea of skip-bigrams with a larger span and multiple statistics. An SVM regression method is adopted to tune the weights of the statistics in the metric. The experimental results show that SNR correlate...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines, as these engines are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
UAEMex at ImageCLEF 2016: Handwritten Retrieval
This paper describes the participation of UAEMex in the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task. We propose a skip-character text search method based on the Longest Common Subsequence. Our system splits the query into its characters to find all longest common subsequences within one line of text.
Longest Common Subsequence: A Method for Automatic Evaluation of Handwritten Essays
Essays have long been used to evaluate students' knowledge. The aim of the proposed system is to evaluate handwritten essays automatically. The proposed method is to develop an automated system to evaluate handwritten student essays. For a single topic under study, students may refer to more than one study material. According to the similarity of the contents a reference s...